Invited Talk: Building Language Resources: Ways to move forward
Abstract
There are perhaps seven thousand languages in the world, ranging from the largest, with hundreds of millions of speakers, to the smallest, with one speaker. On a different axis, languages can be ranked according to the quantity and quality of their computational resources. Not surprisingly, the two axes are correlated: languages like English and Mandarin have substantial resources, while many of the smallest languages are completely undocumented. Nevertheless, the correlation is not perfect; there are languages with a million speakers which are more or less unwritten, and there are very large languages – some of the languages of India, for example – which are relatively resource-poor. Unfortunately, what counts as resource-rich (or even resource-adequate) in computational linguistics is a moving target. For languages to move in the direction of resource richness, considerable effort (people and money) has to be provided over a prolonged period of time. One can sit back and wait for this to happen, or give up; alternatively, one can map out a realistic way forward, building on the strengths of each language’s situation.
Among the strengths which may prove useful to building computational resources for languages are the following:
• Long traditions of grammatical and lexical description
• Traditions of literacy and literature
• Local expertise in linguistics and computing
• The world-wide community of linguists and computer experts
• Resource availability in related languages
At the same time, there are weaknesses and other problems – some language specific, some more general – which need to be considered:
• Lack of consensus on ways of representing the language (scripts, character encoding)
• Complexities inherent in particular languages (complex scripts, complex morphologies, variant orthographies, diglossia, dialectal variation)
• Economic and educational realities in the countries where the language is spoken
• Political attitudes towards some languages, particularly minority languages
• The 'not invented here' syndrome
• Software obsolescence, and the potential obsolescence of language data
This talk will look at ways in which the strengths enumerated above might be leveraged, while avoiding the potential weaknesses.